Exact Exponent in Optimal Rates for Crowdsourcing
Authors
Abstract
In many machine learning applications, crowdsourcing has become the primary means of collecting labels. In this paper, we study the optimal error rate for aggregating labels provided by a set of non-expert workers. Under the classic Dawid-Skene model, we establish matching upper and lower bounds with an exact exponent $mI(\pi)$, in which $m$ is the number of workers and $I(\pi)$ is the average Chernoff information that characterizes the workers' collective ability. Such an exact characterization of the error exponent allows us to state a precise sample size requirement, $m > \frac{1}{I(\pi)}\log\frac{1}{\epsilon}$, in order to achieve an $\epsilon$ misclassification error. In addition, our results imply the optimality of various EM algorithms for crowdsourcing initialized by consistent estimators.
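To make the quantities in the abstract concrete, here is a minimal Python sketch under stated assumptions. It assumes the binary (two-class) Dawid-Skene setting and takes the average Chernoff information to be I(π) = -(1/m) min over t in (0, 1) of Σ_i log Σ_h π_i(h|1)^(1-t) π_i(h|2)^t, where π_i(h|g) is the probability that worker i reports label h when the true label is g. This exact form, the grid search over t, and all function names are illustrative assumptions, not the paper's code. The sketch reads the sample size requirement literally. For the special case of identical symmetric workers with accuracy p (where majority vote is the optimal rule under equal class priors), it also computes the exact majority-vote error in log space, so its empirical exponent can be compared with I(π).

```python
import math


def log_affinity(p, q, t):
    """log sum_h p[h]**(1 - t) * q[h]**t for two label distributions p and q."""
    return math.log(sum(ph ** (1 - t) * qh ** t for ph, qh in zip(p, q)))


def average_chernoff_information(workers, grid=1000):
    """Assumed two-class form: I = -(1/m) * min_t sum_i log sum_h P_i(h)^(1-t) Q_i(h)^t.

    Each worker is a pair (P_i, Q_i): the distribution of the worker's reported
    label when the true label is class 1 and class 2, respectively. Probabilities
    are assumed strictly positive. The minimum over t in (0, 1) is taken by a
    simple grid search, which is an illustrative choice.
    """
    m = len(workers)
    best = min(
        sum(log_affinity(p, q, k / grid) for p, q in workers)
        for k in range(1, grid)
    )
    return -best / m


def required_workers(info, eps):
    """Smallest integer m satisfying m > (1 / info) * log(1 / eps)."""
    return math.floor(math.log(1 / eps) / info) + 1


def log_majority_vote_error(p, m):
    """Log of the exact majority-vote error for m (odd) i.i.d. workers of accuracy p.

    The vote is wrong when the number k of correct votes is at most (m - 1) / 2.
    Binomial terms are summed in log space (log-sum-exp) to avoid underflow.
    """
    assert m % 2 == 1, "odd m avoids ties"
    logs = [
        math.lgamma(m + 1) - math.lgamma(k + 1) - math.lgamma(m - k + 1)
        + k * math.log(p) + (m - k) * math.log(1 - p)
        for k in range((m - 1) // 2 + 1)
    ]
    top = max(logs)
    return top + math.log(sum(math.exp(x - top) for x in logs))


if __name__ == "__main__":
    # Five identical symmetric workers: each reports the true label with probability 0.8.
    workers = [((0.8, 0.2), (0.2, 0.8))] * 5
    info = average_chernoff_information(workers)
    print(round(info, 4))                # 0.2231, i.e. -log(0.8)
    print(required_workers(info, 0.01))  # 21
    m = 2001
    # Empirical exponent -log(error) / m for majority vote; approaches info as m grows.
    print(-log_majority_vote_error(0.8, m) / m)
```

With workers of accuracy 0.8, I(π) = -log 0.8 ≈ 0.223, so the literal bound for ε = 0.01 is m > log(100)/0.223 ≈ 20.6, that is, 21 workers. The exact majority-vote exponent at m = 2001 sits slightly above I(π), because the error also carries sub-exponential prefactors that the exponent ignores.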
Similar resources
IRWIN AND JOAN JACOBS CENTER FOR COMMUNICATION AND INFORMATION TECHNOLOGIES Simplified Erasure/List Decoding
We consider the problem of erasure/list decoding using certain classes of simplified decoders. Specifically, we assume a class of erasure/list decoders, such that a codeword is in the list if its likelihood is larger than a threshold. This class of decoders both approximates the optimal decoder of Forney, and also includes the following simplified subclasses of decoding rules: The first is a fu...
IRWIN AND JOAN JACOBS CENTER FOR COMMUNICATION AND INFORMATION TECHNOLOGIES Exact Random Coding Error Exponents of Optimal Bin Index Decoding
We consider ensembles of channel codes that are partitioned into bins, and focus on analysis of exact random coding error exponents associated with optimum decoding of the index of the bin to which the transmitted codeword belongs. Two main conclusions arise from this analysis: (i) for independent random selection of codewords within a given type class, the random coding exponent of optimal bin...
Quantum Sphere-Packing Bounds with Polynomial Prefactors
We study lower bounds on the optimal error probability in classical coding over classical-quantum channels at rates below the capacity, commonly termed quantum sphere-packing bounds. Winter and Dalai have derived such bounds for classical-quantum channels; however, the exponents in their bounds only coincide when the channel is classical. In this paper, we show that these two exponents admit a v...
Perform Three Data Mining Tasks with Crowdsourcing Process
For data mining studies, because feature selection is complex to carry out by hand, some of the labeling must be sent to workers through crowdsourcing. The process of outsourcing data mining tasks to users is often handled by software systems without sufficient knowledge of the users' age or geographic location. Uncertainty about the performance of virtual user...
Optimum Tradeoffs Between the Error Exponent and the Excess-Rate Exponent of Variable-Rate Slepian-Wolf Coding
We analyze the asymptotic performance of ensembles of random binning Slepian-Wolf codes, where each type class of the source might have a different coding rate. In particular, we first provide the exact encoder excess rate exponent as well as the decoder error exponent. Then, using the error exponent expression, we determine the optimal rate function, namely, the minimal rate for each type clas...